Overview

Dataset statistics

Number of variables: 14
Number of observations: 4605985
Missing cells: 14001623
Missing cells (%): 21.7%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 527.1 MiB
Average record size in memory: 120.0 B
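
The two derived figures in this block follow from the raw counts. The sketch below recomputes them as a sanity check, assuming the missing-cell percentage is taken over all cells (variables × observations) and that 1 MiB is 1024² bytes. The variable names are illustrative and not part of the report.

```python
# Figures copied from the "Dataset statistics" block of this report.
n_variables = 14
n_observations = 4_605_985
missing_cells = 14_001_623
total_size_mib = 527.1

# Assumption: the percentage is relative to every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 MiB = 1024 ** 2 bytes.
avg_record_bytes = total_size_mib * 1024 ** 2 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size in memory: {avg_record_bytes:.1f} B")
```

Both values round to the figures shown in the overview.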

Variable types

Numeric: 8
Categorical: 6

Alerts

ID has a high cardinality: 4605985 distinct values (High cardinality)
date has a high cardinality: 552 distinct values (High cardinality)
SKU is highly correlated with brandId and 1 other field (High correlation)
brandId is highly correlated with SKU (High correlation)
lagerUnitQuantity is highly correlated with SKU and 1 other field (High correlation)
countryOfOrigin is highly correlated with lagerUnitQuantity and 1 other field (High correlation)
price is highly correlated with countryOfOrigin (High correlation)
SKU is highly correlated with lagerUnitQuantity (High correlation)
brandId is highly correlated with lagerUnitQuantity (High correlation)
lagerUnitQuantity is highly correlated with SKU and 1 other field (High correlation)
countryOfOrigin is highly correlated with price (High correlation)
price is highly correlated with countryOfOrigin (High correlation)
lagerUnitQuantity is highly correlated with countryOfOrigin (High correlation)
countryOfOrigin is highly correlated with lagerUnitQuantity (High correlation)
Type is highly correlated with Group and 1 other field (High correlation)
Group is highly correlated with Type and 1 other field (High correlation)
Category is highly correlated with Type and 1 other field (High correlation)
SKU is highly correlated with Category and 6 other fields (High correlation)
Category is highly correlated with SKU and 6 other fields (High correlation)
Type is highly correlated with SKU and 6 other fields (High correlation)
brandId is highly correlated with SKU and 6 other fields (High correlation)
lagerUnitQuantity is highly correlated with SKU and 5 other fields (High correlation)
trademark is highly correlated with SKU and 6 other fields (High correlation)
countryOfOrigin is highly correlated with SKU and 5 other fields (High correlation)
Group is highly correlated with SKU and 6 other fields (High correlation)
geoCluster is highly correlated with cityId (High correlation)
cityId is highly correlated with geoCluster (High correlation)
Category has 484142 (10.5%) missing values (Missing)
Type has 484142 (10.5%) missing values (Missing)
brandId has 1990776 (43.2%) missing values (Missing)
trademark has 1057713 (23.0%) missing values (Missing)
countryOfOrigin has 2329612 (50.6%) missing values (Missing)
price has 3827619 (83.1%) missing values (Missing)
sales has 3827619 (83.1%) missing values (Missing)
ID is uniformly distributed (Uniform)
ID has unique values (Unique)
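
The Missing alerts can be reconciled with the overview: summing the per-variable counts should reproduce the total number of missing cells. A short check using only figures quoted in this report (the dictionary name is illustrative):

```python
# Per-variable missing counts, copied from the Missing alerts in this report.
n_observations = 4_605_985
missing_by_column = {
    "Category": 484_142,
    "Type": 484_142,
    "brandId": 1_990_776,
    "trademark": 1_057_713,
    "countryOfOrigin": 2_329_612,
    "price": 3_827_619,
    "sales": 3_827_619,
}

# Share of missing rows for each variable, as a percentage of all rows.
missing_pct = {
    name: round(100 * count / n_observations, 1)
    for name, count in missing_by_column.items()
}
for name, pct in missing_pct.items():
    print(f"{name}: {pct}%")

# Total over all variables; compare with "Missing cells" in the overview.
total_missing = sum(missing_by_column.values())
print("Total missing cells:", total_missing)
```

The total matches the "Missing cells" figure in the dataset statistics, so the seven variables listed account for every missing cell.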

Reproduction

Analysis started: 2022-04-29 23:48:43.007916
Analysis finished: 2022-04-29 23:59:47.544693
Duration: 11 minutes and 4.54 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json
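
The reported duration is the difference between the two timestamps. A minimal sketch using only the standard library, assuming both timestamps are in the same time zone:

```python
from datetime import datetime

# Timestamps copied from the Reproduction block.
started = datetime.fromisoformat("2022-04-29 23:48:43.007916")
finished = datetime.fromisoformat("2022-04-29 23:59:47.544693")

elapsed = finished - started
minutes, seconds = divmod(elapsed.total_seconds(), 60)
print(f"{int(minutes)} minutes and {seconds:.2f} seconds")
```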

Variables

SKU
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 60
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 367135.311
Minimum: 24
Maximum: 838137
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 70.3 MiB

Quantile statistics

Minimum: 24
5-th percentile: 1008
Q1: 39465
median: 363713
Q3: 642700
95-th percentile: 819149
Maximum: 838137
Range: 838113
Interquartile range (IQR): 603235

Descriptive statistics

Standard deviation: 300513.5005
Coefficient of variation (CV): 0.8185360861
Kurtosis: -1.600911761
Mean: 367135.311
Median Absolute Deviation (MAD): 314263
Skewness: 0.1250106079
Sum: 1.691019735 × 10^12
Variance: 9.0308364 × 10^10
Monotonicity: Increasing
Histogram with fixed size bins (bins=50)
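
Several entries in the quantile and descriptive statistics for SKU are derived from the others and can be cross-checked. A minimal sketch, assuming the usual definitions (CV = standard deviation / mean, variance = standard deviation squared, IQR = Q3 − Q1, range = maximum − minimum); the variable names are illustrative:

```python
# Values copied from the SKU statistics in this report.
mean = 367135.311
std = 300513.5005
q1, q3 = 39465, 642700
minimum, maximum = 24, 838137
n_observations = 4_605_985

cv = std / mean                 # coefficient of variation
variance = std ** 2             # variance is the squared standard deviation
iqr = q3 - q1                   # interquartile range
value_range = maximum - minimum
total = mean * n_observations   # sum over all (non-missing) rows

print(f"CV       = {cv:.10f}")
print(f"Variance = {variance:.7e}")
print(f"IQR      = {iqr}")
print(f"Range    = {value_range}")
print(f"Sum      = {total:.9e}")
```

The results agree with the reported figures up to rounding of the inputs.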
Value                 Count     Frequency (%)
32485                 127987    2.8%
32550                 115534    2.5%
32546                 114501    2.5%
32490                 112642    2.4%
819149                106667    2.3%
47330                 103272    2.2%
144184                100930    2.2%
736360                100523    2.2%
787133                98721     2.1%
26194                 98387     2.1%
Other values (50)     3526821   76.6%

Value                 Count     Frequency (%)
24                    73388     1.6%
208                   84022     1.8%
1008                  78041     1.7%
16649                 96990     2.1%
20872                 84485     1.8%
26194                 98387     2.1%
32485                 127987    2.8%
32490                 112642    2.4%
32546                 114501    2.5%
32549                 90627     2.0%

Value                 Count     Frequency (%)
838137                43229     0.9%
819150                97955     2.1%
819149                106667    2.3%
815381                63361     1.4%
802382                89149     1.9%
787133                98721     2.1%
782787                95316     2.1%
736360                100523    2.2%
736357                80129     1.7%
711838                85436     1.9%
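
The "Other values" row and the frequency column of the SKU common-values table can be recomputed from the listed counts. A small sketch, assuming frequencies are taken relative to all 4605985 observations (the dictionary name is illustrative):

```python
# Ten most common SKU values and their counts, copied from this report.
n_observations = 4_605_985
top_counts = {
    32485: 127_987,
    32550: 115_534,
    32546: 114_501,
    32490: 112_642,
    819149: 106_667,
    47330: 103_272,
    144184: 100_930,
    736360: 100_523,
    787133: 98_721,
    26194: 98_387,
}

# Frequency of each listed value as a percentage of all rows.
frequencies = {sku: round(100 * c / n_observations, 1) for sku, c in top_counts.items()}

# Everything not in the top ten is lumped into "Other values".
other_count = n_observations - sum(top_counts.values())
other_pct = round(100 * other_count / n_observations, 1)

print(frequencies)
print("Other values:", other_count, f"({other_pct}%)")
```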

Category
Categorical

HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 39
Distinct (%): < 0.1%
Missing: 484142
Missing (%): 10.5%
Memory size: 70.3 MiB

Semi-hard coarse-pored cheese: 381556
Avocado: 254592
Water, sparkling: 235834
Water, still: 230848
Water, import, sparkling: 206376
Other values (34): 2812637

Length

Max length: 58
Median length: 21
Mean length: 22.94336563
Min length: 4

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Pomegranate
2nd row: Pomegranate
3rd row: Pomegranate
4th row: Pomegranate
5th row: Pomegranate

Common Values

Value                                                Count     Frequency (%)
Semi-hard coarse-pored cheese                        381556    8.3%
Avocado                                              254592    5.5%
Water, sparkling                                     235834    5.1%
Water, still                                         230848    5.0%
Water, import, sparkling                             206376    4.5%
Yoghurts                                             180652    3.9%
Grapefruit                                           147906    3.2%
Small fancy bread with berry stuff, own production   142335    3.1%
Banana                                               127987    2.8%
Lemon                                                115534    2.5%
Other values (29)                                    2098223   45.6%
(Missing)                                            484142    10.5%

Length

Histogram of lengths of the category

Value                 Count     Frequency (%)
production            1003719   7.3%
own                   1003719   7.3%
water                 766911    5.6%
bread                 647997    4.7%
cheese                622851    4.5%
plain                 570528    4.2%
sparkling             536063    3.9%
semi-hard             528902    3.9%
in                    460838    3.4%
the                   460838    3.4%
Other values (49)     7100746   51.8%


Type
Categorical

HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 38
Distinct (%): < 0.1%
Missing: 484142
Missing (%): 10.5%
Memory size: 70.3 MiB

semi-hard cheese — More expensive — National — Available import: 436688
Fancy bread — small — sweet: 214162
Yoghurt — firm —plain: 180652
Bread — plain — wheat: 177911
Table water — PET — from 1 to 2 L — Still: 168696
Other values (33): 2943734

Length

Max length: 63
Median length: 28
Mean length: 32.80180929
Min length: 7

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Tropical fruit — Pomegranate— Plain
2nd row: Tropical fruit — Pomegranate— Plain
3rd row: Tropical fruit — Pomegranate— Plain
4th row: Tropical fruit — Pomegranate— Plain
5th row: Tropical fruit — Pomegranate— Plain

Common Values

Value                                                             Count     Frequency (%)
semi-hard cheese — More expensive — National — Available import   436688    9.5%
Fancy bread — small — sweet                                       214162    4.6%
Yoghurt — firm —plain                                             180652    3.9%
Bread — plain — wheat                                             177911    3.9%
Table water — PET — from 1 to 2 L — Still                         168696    3.7%
Therapeutic-table water — PET — from 1 to 2 L                     168507    3.7%
Baguette loaf — plain                                             158793    3.4%
Tropical fruit — Avocado — Ready to Eat                           141950    3.1%
Bread — flavored                                                  140966    3.1%
Therapeutic-table water — Import — PET                            128335    2.8%
Other values (28)                                                 2205183   47.9%
(Missing)                                                         484142    10.5%

Length

Histogram of lengths of the category

Value                 Count     Frequency (%)
(blank)               7426274   27.9%
plain                 1171483   4.4%
fruit                 1052589   4.0%
bread                 825362    3.1%
water                 766911    2.9%
import                737013    2.8%
to                    721031    2.7%
pet                   688870    2.6%
cheese                622851    2.3%
more                  592789    2.2%
Other values (64)     12031684  45.2%


brandId
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 14
Distinct (%): < 0.1%
Missing: 1990776
Missing (%): 43.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 3663.891803
Minimum: 967
Maximum: 8314
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 70.3 MiB

Quantile statistics

Minimum: 967
5-th percentile: 1241
Q1: 1330
median: 2737
Q3: 7358
95-th percentile: 8276
Maximum: 8314
Range: 7347
Interquartile range (IQR): 6028

Descriptive statistics

Standard deviation: 2626.027328
Coefficient of variation (CV): 0.7167316802
Kurtosis: -0.8912058651
Mean: 3663.891803
Median Absolute Deviation (MAD): 1407
Skewness: 0.8785632628
Sum: 9581842817
Variance: 6896019.525
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=14)
Value                 Count     Frequency (%)
2737                  792182    17.2%
1241                  413215    9.0%
7358                  204622    4.4%
1330                  201185    4.4%
8276                  180652    3.9%
8274                  158221    3.4%
1594                  95316     2.1%
967                   95055     2.1%
7418                  92214     2.0%
3784                  89149     1.9%
Other values (4)      293398    6.4%
(Missing)             1990776   43.2%

Value                 Count     Frequency (%)
967                   95055     2.1%
1241                  413215    9.0%
1330                  201185    4.4%
1594                  95316     2.1%
2313                  80698     1.8%
2693                  84485     1.8%
2724                  75479     1.6%
2737                  792182    17.2%
3784                  89149     1.9%
7358                  204622    4.4%

Value                 Count     Frequency (%)
8314                  52736     1.1%
8276                  180652    3.9%
8274                  158221    3.4%
7418                  92214     2.0%
7358                  204622    4.4%
3784                  89149     1.9%
2737                  792182    17.2%
2724                  75479     1.6%
2693                  84485     1.8%
2313                  80698     1.8%

lagerUnitQuantity
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 20
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 115.6666998
Minimum: 0.5
Maximum: 550
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 70.3 MiB

Quantile statistics

Minimum: 0.5
5-th percentile: 1
Q1: 1
median: 1.5
Q3: 300
95-th percentile: 400
Maximum: 550
Range: 549.5
Interquartile range (IQR): 299

Descriptive statistics

Standard deviation: 160.2828934
Coefficient of variation (CV): 1.38573067
Kurtosis: -0.7569635804
Mean: 115.6666998
Median Absolute Deviation (MAD): 0.5
Skewness: 0.9306525846
Sum: 532759084.2
Variance: 25690.60591
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=20)
Value                 Count     Frequency (%)
1                     1927060   41.8%
350                   575046    12.5%
1.5                   359350    7.8%
2                     237754    5.2%
400                   209846    4.6%
300                   208752    4.5%
70                    141502    3.1%
250                   127826    2.8%
305                   95055     2.1%
340                   82448     1.8%
Other values (10)     641346    13.9%

Value                 Count     Frequency (%)
0.5                   78041     1.7%
0.75                  60309     1.3%
1                     1927060   41.8%
1.5                   359350    7.8%
2                     237754    5.2%
6                     62152     1.3%
28                    75479     1.6%
50                    52736     1.1%
70                    141502    3.1%
80                    67629     1.5%

Value                 Count     Frequency (%)
550                   43946     1.0%
500                   48529     1.1%
400                   209846    4.6%
350                   575046    12.5%
340                   82448     1.8%
305                   95055     2.1%
300                   208752    4.5%
250                   127826    2.8%
200                   71827     1.6%
120                   80698     1.8%

trademark
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 23
Distinct (%): < 0.1%
Missing: 1057713
Missing (%): 23.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6290.387812
Minimum: 297
Maximum: 15156
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 70.3 MiB

Quantile statistics

Minimum: 297
5-th percentile: 1037
Q1: 1921
median: 5070
Q3: 9666
95-th percentile: 15156
Maximum: 15156
Range: 14859
Interquartile range (IQR): 7745

Descriptive statistics

Standard deviation: 4434.391018
Coefficient of variation (CV): 0.7049471592
Kurtosis: -0.8683282276
Mean: 6290.387812
Median Absolute Deviation (MAD): 3897
Skewness: 0.4643881015
Sum: 2.232000694 × 10^10
Variance: 19663823.7
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=23)
Value                 Count     Frequency (%)
9666                  743919    16.2%
15156                 324328    7.0%
6133                  204622    4.4%
2781                  201185    4.4%
9974                  193519    4.2%
1323                  190843    4.1%
5070                  180652    3.9%
1037                  140452    3.0%
4970                  138350    3.0%
297                   103272    2.2%
Other values (13)     1127130   24.5%
(Missing)             1057713   23.0%

Value                 Count     Frequency (%)
297                   103272    2.2%
1037                  140452    3.0%
1120                  95055     2.1%
1173                  95316     2.1%
1192                  92214     2.0%
1323                  190843    4.1%
1835                  75479     1.6%
1839                  68026     1.5%
1921                  80698     1.8%
2781                  201185    4.4%

Value                 Count     Frequency (%)
15156                 324328    7.0%
11960                 94577     2.1%
9974                  193519    4.2%
9666                  743919    16.2%
8802                  93949     2.0%
6133                  204622    4.4%
5070                  180652    3.9%
4970                  138350    3.0%
4891                  98387     2.1%
4384                  84485     1.8%

countryOfOrigin
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 6
Distinct (%): < 0.1%
Missing: 2329612
Missing (%): 50.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.333039884
Minimum: 1
Maximum: 37
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 70.3 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
median: 1
Q3: 14
95-th percentile: 33
Maximum: 37
Range: 36
Interquartile range (IQR): 13

Descriptive statistics

Standard deviation: 10.0943899
Coefficient of variation (CV): 1.376562798
Kurtosis: 1.641600802
Mean: 7.333039884
Median Absolute Deviation (MAD): 0
Skewness: 1.606336295
Sum: 16692734
Variance: 101.8967075
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)
Value                 Count     Frequency (%)
1                     1429345   31.0%
16                    296236    6.4%
14                    206376    4.5%
6                     146685    3.2%
33                    140452    3.0%
37                    57279     1.2%
(Missing)             2329612   50.6%

Value                 Count     Frequency (%)
1                     1429345   31.0%
6                     146685    3.2%
14                    206376    4.5%
16                    296236    6.4%
33                    140452    3.0%
37                    57279     1.2%

Value                 Count     Frequency (%)
37                    57279     1.2%
33                    140452    3.0%
16                    296236    6.4%
14                    206376    4.5%
6                     146685    3.2%
1                     1429345   31.0%

Group
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 70.3 MiB

Bakery: 1214676
Tropical fruits: 1180576
Yogurts: 820971
Mineral water: 766911
Cheese: 622851

Length

Max length: 15
Median length: 7
Mean length: 9.650583317
Min length: 6

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Tropical fruits
2nd row: Tropical fruits
3rd row: Tropical fruits
4th row: Tropical fruits
5th row: Tropical fruits

Common Values

Value                 Count     Frequency (%)
Bakery                1214676   26.4%
Tropical fruits       1180576   25.6%
Yogurts               820971    17.8%
Mineral water         766911    16.7%
Cheese                622851    13.5%

Length

Histogram of lengths of the category

Pie chart

Value                 Count     Frequency (%)
bakery                1214676   18.5%
fruits                1180576   18.0%
tropical              1180576   18.0%
yogurts               820971    12.5%
water                 766911    11.7%
mineral               766911    11.7%
cheese                622851    9.5%


geoCluster
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 446
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2220.030811
Minimum: 92
Maximum: 3230
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 70.3 MiB

Quantile statistics

Minimum: 92
5-th percentile: 1998
Q1: 2049
median: 2158
Q3: 2269
95-th percentile: 2735
Maximum: 3230
Range: 3138
Interquartile range (IQR): 220

Descriptive statistics

Standard deviation: 245.9249253
Coefficient of variation (CV): 0.110775456
Kurtosis: 6.275564569
Mean: 2220.030811
Median Absolute Deviation (MAD): 110
Skewness: 0.5653726135
Sum: 1.022542861 × 10^10
Variance: 60479.06888
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count     Frequency (%)
2043                  26383     0.6%
2022                  26076     0.6%
1935                  25763     0.6%
2069                  25680     0.6%
2580                  25634     0.6%
1998                  25391     0.6%
2064                  25256     0.5%
2049                  25201     0.5%
1991                  25131     0.5%
2060                  24772     0.5%
Other values (436)    4350698   94.5%

Value                 Count     Frequency (%)
92                    37        < 0.1%
112                   149       < 0.1%
113                   2         < 0.1%
117                   62        < 0.1%
131                   105       < 0.1%
148                   1132      < 0.1%
162                   12        < 0.1%
189                   239       < 0.1%
199                   48        < 0.1%
260                   203       < 0.1%

Value                 Count     Frequency (%)
3230                  221       < 0.1%
3209                  4332      0.1%
3196                  29        < 0.1%
3179                  33        < 0.1%
3177                  51        < 0.1%
3175                  2         < 0.1%
3172                  46        < 0.1%
3168                  1215      < 0.1%
3166                  13        < 0.1%
3164                  17        < 0.1%

cityId
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 70.3 MiB

0: 4598463
1: 7456
25: 66

Length

Max length: 2
Median length: 1
Mean length: 1.000014329
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value                 Count     Frequency (%)
0                     4598463   99.8%
1                     7456      0.2%
25                    66        < 0.1%

Length

Histogram of lengths of the category

Pie chart

Value                 Count     Frequency (%)
0                     4598463   99.8%
1                     7456      0.2%
25                    66        < 0.1%


ID
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE

Distinct: 4605985
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 70.3 MiB

RR34354985: 1
RR54855593: 1
RR44792073: 1
RR34027726: 1
RR42724877: 1
Other values (4605980): 4605980

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4605985
Unique (%): 100.0%

Sample

1st row: RR27958444
2nd row: RR27958445
3rd row: RR27958446
4th row: RR27958447
5th row: RR27958448

Common Values

Value                    Count     Frequency (%)
RR34354985               1         < 0.1%
RR54855593               1         < 0.1%
RR44792073               1         < 0.1%
RR34027726               1         < 0.1%
RR42724877               1         < 0.1%
RR46125685               1         < 0.1%
RR29087373               1         < 0.1%
RR52608169               1         < 0.1%
RR51017088               1         < 0.1%
RR52497504               1         < 0.1%
Other values (4605975)   4605975   > 99.9%

Length

Histogram of lengths of the category

Value                    Count     Frequency (%)
rr52245514               1         < 0.1%
rr28239098               1         < 0.1%
rr29047979               1         < 0.1%
rr34228419               1         < 0.1%
rr30663627               1         < 0.1%
rr51844155               1         < 0.1%
rr36366175               1         < 0.1%
rr44667992               1         < 0.1%
rr44989915               1         < 0.1%
rr52591939               1         < 0.1%
Other values (4605975)   4605975   > 99.9%


date
Categorical

HIGH CARDINALITY

Distinct: 552
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 70.3 MiB

2021-07-05: 14361
2021-07-04: 14347
2021-07-03: 14336
2021-07-02: 14318
2021-07-01: 14291
Other values (547): 4534332

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2021-03-03
2nd row: 2021-03-04
3rd row: 2021-03-05
4th row: 2021-03-06
5th row: 2021-03-07

Common Values

Value                 Count     Frequency (%)
2021-07-05            14361     0.3%
2021-07-04            14347     0.3%
2021-07-03            14336     0.3%
2021-07-02            14318     0.3%
2021-07-01            14291     0.3%
2021-06-30            14260     0.3%
2021-06-29            14227     0.3%
2021-06-28            14205     0.3%
2021-06-27            14192     0.3%
2021-06-26            14180     0.3%
Other values (542)    4463268   96.9%

Length

Histogram of lengths of the category

Value                 Count     Frequency (%)
2021-07-05            14361     0.3%
2021-07-04            14347     0.3%
2021-07-03            14336     0.3%
2021-07-02            14318     0.3%
2021-07-01            14291     0.3%
2021-06-30            14260     0.3%
2021-06-29            14227     0.3%
2021-06-28            14205     0.3%
2021-06-27            14192     0.3%
2021-06-26            14180     0.3%
Other values (542)    4463268   96.9%


price
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 1967
Distinct (%): 0.3%
Missing: 3827619
Missing (%): 83.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 67.4906662
Minimum: 0
Maximum: 7246.89
Zeros: 655
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 70.3 MiB

Quantile statistics

Minimum: 0
5-th percentile: 15.69
Q1: 24.19
median: 38.49
Q3: 61.89
95-th percentile: 282.69
Maximum: 7246.89
Range: 7246.89
Interquartile range (IQR): 37.7

Descriptive statistics

Standard deviation: 85.63161109
Coefficient of variation (CV): 1.268791907
Kurtosis: 100.9603377
Mean: 67.4906662
Median Absolute Deviation (MAD): 17.4
Skewness: 3.835564146
Sum: 52532439.89
Variance: 7332.772818
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count     Frequency (%)
28.49                 30740     0.7%
38.39                 18033     0.4%
36.99                 16637     0.4%
17.09                 15681     0.3%
42.69                 15442     0.3%
27.09                 14784     0.3%
19.99                 14006     0.3%
18.79                 13968     0.3%
35.59                 13946     0.3%
424.69                12444     0.3%
Other values (1957)   612685    13.3%
(Missing)             3827619   83.1%

Value                 Count     Frequency (%)
0                     655       < 0.1%
0.09                  14        < 0.1%
0.19                  6         < 0.1%
0.99                  1         < 0.1%
1.09                  2         < 0.1%
2.79                  4         < 0.1%
2.89                  1         < 0.1%
3.29                  1         < 0.1%
3.69                  2         < 0.1%
3.79                  4         < 0.1%

Value                 Count     Frequency (%)
7246.89               1         < 0.1%
6049.29               1         < 0.1%
1817.69               1         < 0.1%
1675.69               1         < 0.1%
1585.79               1         < 0.1%
1060.39               1         < 0.1%
908.89                1         < 0.1%
639.09                1         < 0.1%
511.29                1         < 0.1%
482.89                1         < 0.1%

sales
Real number (ℝ≥0)

MISSING

Distinct: 840
Distinct (%): 0.1%
Missing: 3827619
Missing (%): 83.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.593747787
Minimum: 0.001
Maximum: 801
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 70.3 MiB

Quantile statistics

Minimum: 0.001
5-th percentile: 0.4
Q1: 1
median: 2.5
Q3: 4
95-th percentile: 10.2
Maximum: 801
Range: 800.999
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 4.627157516
Coefficient of variation (CV): 1.287557667
Kurtosis: 1492.570332
Mean: 3.593747787
Median Absolute Deviation (MAD): 1.5
Skewness: 15.01689529
Sum: 2797251.09
Variance: 21.41058667
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count     Frequency (%)
2                     127192    2.8%
3                     107036    2.3%
1                     85388     1.9%
4                     71411     1.6%
5                     42861     0.9%
6                     28168     0.6%
0.5                   23221     0.5%
0.4                   22514     0.5%
0.3                   21828     0.5%
7                     18382     0.4%
Other values (830)    230365    5.0%
(Missing)             3827619   83.1%

Value                 Count     Frequency (%)
0.001                 11        < 0.1%
0.008                 1         < 0.1%
0.037                 1         < 0.1%
0.07                  1         < 0.1%
0.078                 1         < 0.1%
0.086                 1         < 0.1%
0.091                 1         < 0.1%
0.1                   3765      0.1%
0.101                 3         < 0.1%
0.104                 1         < 0.1%

Value                 Count     Frequency (%)
801                   1         < 0.1%
489                   1         < 0.1%
421                   1         < 0.1%
309.4                 1         < 0.1%
302                   1         < 0.1%
243                   1         < 0.1%
220.79                1         < 0.1%
217.5                 1         < 0.1%
170                   1         < 0.1%
158                   1         < 0.1%

Interactions


Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
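
The calculation described above can be sketched in plain Python: rank both variables, then apply the covariance-over-standard-deviations formula to the ranks. This is an illustrative sketch, not the code the report ran; the function names and toy data are assumptions, and tied values are given the average of their positions.

```python
from statistics import mean

def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def pearson(x, y):
    """Covariance divided by the product of standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]  # monotonic but nonlinear (y = x ** 3)

print("Spearman:", spearman(x, y))
print("Pearson: ", pearson(x, y))
```

On this toy data ρ is 1 (up to floating-point rounding) because the relationship is perfectly monotonic, while Pearson's r stays below 1 because the relationship is not linear.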

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
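
A minimal sketch of this formula, together with a check of the location and scale invariance mentioned above. The function name and the toy data are illustrative assumptions, not values from the dataset.

```python
from statistics import mean

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    # The 1/n factors of covariance and standard deviations cancel out.
    return cov / (sx * sy)

x = [2.0, 4.0, 5.0, 7.0, 9.0]
y = [1.0, 3.0, 2.0, 6.0, 8.0]

r = pearson_r(x, y)

# Shift and rescale each variable separately (positive scale factors).
r_transformed = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])

# Two exact linear relationships with very different slopes.
r_steep = pearson_r(x, [100 * a + 1 for a in x])
r_shallow = pearson_r(x, [0.01 * a + 1 for a in x])

print(r, r_transformed, r_steep, r_shallow)
```

The first two values coincide, and both exact lines give r = 1 (up to rounding) regardless of slope.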

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
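
The pair-counting description above can be sketched as follows. This is the simple variant (often called τ-a), which counts tied pairs as neither concordant nor discordant; whether the report uses a tie-corrected variant is not stated here, so treat this as an illustration with hypothetical data.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs, with no correction for ties."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:      # both variables move the same way
            concordant += 1
        elif direction < 0:    # the variables move in opposite ways
            discordant += 1
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Toy data (hypothetical): one swapped pair out of six.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)   # 5 concordant pairs, 1 discordant pair
```

The loop is quadratic in the number of observations, so for a dataset of this report's size a library implementation would be the practical choice.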

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
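
A minimal sketch of a bias-corrected Cramér's V in the form given by Bergsma (2013) is shown below. It is written from the published formula using only the standard library and is not taken from the report's source code; the function name and the toy category labels are assumptions.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equally long lists of category labels."""
    n = len(a)
    joint = Counter(zip(a, b))
    row_totals = Counter(a)
    col_totals = Counter(b)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for row, n_row in row_totals.items():
        for col, n_col in col_totals.items():
            expected = n_row * n_col / n
            observed = joint.get((row, col), 0)
            chi2 += (observed - expected) ** 2 / expected

    r, k = len(row_totals), len(col_totals)
    phi2 = chi2 / n
    # Bias corrections: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Toy data (hypothetical labels).
perfect = cramers_v_corrected(["x", "y"] * 50, ["p", "q"] * 50)
independent = cramers_v_corrected(["x", "x", "y", "y"] * 25, ["p", "q", "p", "q"] * 25)
```

In the first call each label of one variable determines the label of the other, so the coefficient is at its maximum; in the second every cell count equals its expected count, so the corrected coefficient is 0.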

Phik (φk)

Phik (φk) is a correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
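
To make the nullity correlation used by the heatmap concrete, here is a small sketch that converts each column into a present/absent indicator and correlates the indicators. It assumes missing values are represented as `None`; the function name and the toy columns are hypothetical and only mimic the shape of the sample rows.

```python
from math import sqrt


def nullity_correlation(col_a, col_b):
    """Correlation between the presence indicators of two columns.

    Returns None when either column is always present or always missing,
    because the correlation is undefined without variation.
    """
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return None
    return cov / sqrt(var_a * var_b)


# Toy columns (hypothetical): price and sales are missing in the same rows.
price = [79.49, None, None, 39.79]
sales = [0.4, None, None, 10.0]
date = ["2021-03-03", "2021-03-04", "2021-03-05", "2021-03-06"]

price_vs_sales = nullity_correlation(price, sales)   # missing together
price_vs_date = nullity_correlation(price, date)     # date is never missing
```

A value near 1 means two columns tend to be present or missing together, a value near -1 means one tends to be present when the other is missing, and fully populated columns drop out because they have no variation.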

Sample

First rows

   SKU  Category     Type                                 brandId  lagerUnitQuantity  trademark  countryOfOrigin  Group            geoCluster  cityId  ID          date        price  sales
0  24   Pomegranate  Tropical fruit — Pomegranate— Plain  NaN      1.0                NaN        NaN              Tropical fruits  323         1       RR27958444  2021-03-03  79.49  0.4
1  24   Pomegranate  Tropical fruit — Pomegranate— Plain  NaN      1.0                NaN        NaN              Tropical fruits  323         1       RR27958445  2021-03-04  NaN    NaN
2  24   Pomegranate  Tropical fruit — Pomegranate— Plain  NaN      1.0                NaN        NaN              Tropical fruits  323         1       RR27958446  2021-03-05  NaN    NaN
3  24   Pomegranate  Tropical fruit — Pomegranate— Plain  NaN      1.0                NaN        NaN              Tropical fruits  323         1       RR27958447  2021-03-06  NaN    NaN
4  24   Pomegranate  Tropical fruit — Pomegranate— Plain  NaN      1.0                NaN        NaN              Tropical fruits  323         1       RR27958448  2021-03-07  NaN    NaN
5  24   Pomegranate  Tropical fruit — Pomegranate— Plain  NaN      1.0                NaN        NaN              Tropical fruits  323         1       RR27958449  2021-03-08  NaN    NaN
6  24   Pomegranate  Tropical fruit — Pomegranate— Plain  NaN      1.0                NaN        NaN              Tropical fruits  323         1       RR27958450  2021-03-09  NaN    NaN
7  24   Pomegranate  Tropical fruit — Pomegranate— Plain  NaN      1.0                NaN        NaN              Tropical fruits  323         1       RR27958451  2021-03-10  NaN    NaN
8  24   Pomegranate  Tropical fruit — Pomegranate— Plain  NaN      1.0                NaN        NaN              Tropical fruits  323         1       RR27958452  2021-03-11  NaN    NaN
9  24   Pomegranate  Tropical fruit — Pomegranate— Plain  NaN      1.0                NaN        NaN              Tropical fruits  323         1       RR27958453  2021-03-12  NaN    NaN

Last rows

         SKU     Category  Type                                     brandId  lagerUnitQuantity  trademark  countryOfOrigin  Group            geoCluster  cityId  ID          date        price  sales
4605975  838137  Avocado   Tropical fruit — Avocado — Ready to Eat  NaN      1.0                9666.0     NaN              Tropical fruits  3209        0       RR31318819  2021-07-02  39.79  10.0
4605976  838137  Avocado   Tropical fruit — Avocado — Ready to Eat  NaN      1.0                9666.0     NaN              Tropical fruits  3209        0       RR31318820  2021-07-03  39.79  7.0
4605977  838137  Avocado   Tropical fruit — Avocado — Ready to Eat  NaN      1.0                9666.0     NaN              Tropical fruits  3209        0       RR31318821  2021-07-04  39.79  10.0
4605978  838137  Avocado   Tropical fruit — Avocado — Ready to Eat  NaN      1.0                9666.0     NaN              Tropical fruits  3209        0       RR31318822  2021-07-05  39.79  3.0
4605979  838137  Avocado   Tropical fruit — Avocado — Ready to Eat  NaN      1.0                9666.0     NaN              Tropical fruits  3230        0       RR31319304  2021-06-30  39.79  4.0
4605980  838137  Avocado   Tropical fruit — Avocado — Ready to Eat  NaN      1.0                9666.0     NaN              Tropical fruits  3230        0       RR31319305  2021-07-01  39.79  7.0
4605981  838137  Avocado   Tropical fruit — Avocado — Ready to Eat  NaN      1.0                9666.0     NaN              Tropical fruits  3230        0       RR31319306  2021-07-02  39.79  6.0
4605982  838137  Avocado   Tropical fruit — Avocado — Ready to Eat  NaN      1.0                9666.0     NaN              Tropical fruits  3230        0       RR31319307  2021-07-03  39.79  21.0
4605983  838137  Avocado   Tropical fruit — Avocado — Ready to Eat  NaN      1.0                9666.0     NaN              Tropical fruits  3230        0       RR31319308  2021-07-04  39.39  20.0
4605984  838137  Avocado   Tropical fruit — Avocado — Ready to Eat  NaN      1.0                9666.0     NaN              Tropical fruits  3230        0       RR31319309  2021-07-05  38.69  8.0